Cross Language Information Retrieval: an Experiment in Bilingual News Article Alignment from the Internet using MT
Abstract
Cross Language Information Retrieval (CLIR) offers the potential for users to search document collections in foreign languages. This is particularly relevant now that the Internet has become a global information source. Machine translation (MT) has a key role in bridging the gap between the language of the user's query and that of the document collection, as well as in helping the user understand the search results through gisting. In this paper we reformulate the CLIR task as text alignment on a database of Reuter news articles. We show preliminary results for CLIR using relevance feedback with machine-translated queries from Japanese into English.
Similar resources
How Similar are Chinese and Japanese for Cross-Language Information Retrieval?
For NTCIR Workshop 5 UC Berkeley participated in the bilingual task of the CLIR track. Our focus was on Chinese topic searches against the Japanese News document collection, and on Japanese topic search against the Chinese News Document Collection. Extending our work of NTCIR 4 workshop, we performed search experiments to segment and use Chinese search topics directly as if they were Japanese t...
Machine Translation versus Dictionary Term Translation - A Comparison for English-Japanese News Article Alignment
Bilingual news article alignment methods based on multilingual information retrieval have been shown to be successful for the automatic production of so-called noisy-parallel corpora. In this paper we compare the use of machine translation (MT) to the commonly used dictionary term lookup (DTL) method for Reuter news article alignment in English and Japanese. The results show the trade-off betwe...
Cross-lingual Information Retrieval Model based on Bilingual Topic Correlation
How to construct relationships between bilingual texts is important for effectively processing multilingual text data and crossing language barriers. Corpus-based cross-lingual latent semantic indexing (CL-LSI) does not fully take into account bilingual semantic relationships. The paper proposes a new model that builds semantic relationships between bilingual parallel documents via partial least squares (PLS)....
The Effect of Bilingual Term List Size on Dictionary-Based Cross-Language Information Retrieval
Bilingual term lists are extensively used as a resource for dictionary-based Cross-Language Information Retrieval (CLIR), in which the goal is to find documents written in one natural language based on queries that are expressed in another. This paper identifies eight types of terms that affect retrieval effectiveness in CLIR applications through their coverage by general-purpose bilingual term...
Using Uplug and SiteSeeker to construct a cross language search engine for Scandinavian languages
This paper presents how we adapted a website search engine for cross language information retrieval, using the Uplug word alignment tool for parallel corpora. We first studied the monolingual search queries posed by the visitors of the website of the Nordic council containing six different languages. In order to compare how well different types of bilingual dictionaries covered the most common ...